Multi-Site Data Collection for a Spoken Language Corpus

نویسنده

  • Lynette Hirschman
چکیده

This paper describes a recently collected spoken language corpus for the ATIS (Air Travel Information System) domain. This data collection effort has been co-ordinated by MADCOW (Multi-site ATIS Data COllection Working group). We summarize the motivation for this effort, the goals, the implementation of a multi-site data collection paradigm, and the accomplishments of MADCOW in monitoring the collection and distribution of 12,000 utterances of spontaneous speech from five sites for use in a multi-site common evaluation of speech, natural language and spoken language.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multi - Site Data Collection and Evaluationin Spoken Language

The Air Travel Information System (ATIS) domain serves as the common task for DARPA spoken language system research and development. The approaches and results possible in this rapidly growing area are structured by available corpora, annotations of that data, and evaluation methods. Coordination of this crucial infrastructure is the charter of the Multi-Site ATIS Data COllection Working group ...

متن کامل

Multi-Site Data Collection and Evaluation in Spoken Language Understanding

The Air Travel Information System (ATIS) domain serves as the common task for DARPA spoken language system research and development. The approaches and results possible in this rapidly growing area are structured by available corpora, annotations of that data, and evaluation methods. Coordination of this crucial infrastructure is the charter of the Multi-Site ATIS Data COllection Working group ...

متن کامل

Expanding the Scope of the ATIS Task: The ATIS-3 Corpus

The Air Travel Information System (ATIS) domain serves as the common evaluation task for ARPA"spoken language system developers. 1 To support this task, the Multi-Site ATIS Data COllection Working group (MADCOW) coordinates data collection activities. This paper describes recent MADCOW activities. In particular, this paper describes the migration of the ATIS task to a richer relational database...

متن کامل

Benchmark Tests For The Darpa Spoken Language Program

This paper documents benchmark tests implemented within the DARPA Spoken Language Program during the period November, 1992 January, 1993. Tests were conducted using the Wall Street Journal-based Continuous Speech Recognition (WSJ-CSR) corpus and the Air Travel Information System (ATIS) corpus collected by the Multi-site ATIS Data COllection Working (MADCOW) Group. The WSJ-CSR tests consist of t...

متن کامل

Vague Language and Interpersonal Communication: An Analysis of Adolescent Intercultural Conversation

This paper is concerned with the analysis of the spoken language of teenagers, taken from a newly developed specialised corpus the British and Taiwanese Teenage Intercultural Communication Corpus (BATTICC). More specifically, the study employs a discourse analytical approach to examine vague language in an intercultural context among a group of British and Taiwanese adolescents, paying particul...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1992